1,412 research outputs found

    An extended Stein-type covariance identity for the Pearson family with applications to lower variance bounds

    For an absolutely continuous (integer-valued) r.v. X of the Pearson (Ord) family, we show that, under natural moment conditions, a Stein-type covariance identity of order k holds (cf. [Goldstein and Reinert, J. Theoret. Probab. 18 (2005) 237–260]). This identity is closely related to the corresponding sequence of orthogonal polynomials, obtained by a Rodrigues-type formula, and provides convenient expressions for the Fourier coefficients of an arbitrary function. Application of the covariance identity yields some novel expressions for the corresponding lower variance bounds for a function of the r.v. X, expressions that seem to be known only in particular cases (for the Normal, see [Houdré and Kagan, J. Theoret. Probab. 8 (1995) 23–30]; see also [Houdré and Pérez-Abreu, Ann. Probab. 23 (1995) 400–419] for corresponding results related to the Wiener and Poisson processes). Some applications are also given. Comment: Published at http://dx.doi.org/10.3150/10-BEJ282 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

    Strengthened Chernoff-type variance bounds

    Let X be an absolutely continuous random variable from the integrated Pearson family and assume that X has finite moments of any order. Using some properties of the associated orthonormal polynomial system, we provide a class of strengthened Chernoff-type variance bounds. Comment: Published at http://dx.doi.org/10.3150/12-BEJ484 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)

    Optimal Piecewise Linear Regression Algorithm for QSAR Modelling

    Quantitative Structure‐Activity Relationship (QSAR) models have been successfully applied to lead optimisation, virtual screening and other areas of drug discovery over the years. Recent studies, however, have focused on the development of models that are predictive but often not interpretable. In this article, we propose the application of a piecewise linear regression algorithm, OPLRAreg, to develop QSAR models that are both predictive and interpretable. The algorithm determines the feature that best separates the data into regions and identifies linear equations to predict the outcome variable in each region. A regularisation term is introduced to prevent overfitting and to implicitly select the most informative features. As OPLRAreg is based on mathematical programming, a flexible and transparent representation for optimisation problems, the algorithm also permits customised constraints to be easily added to the model. The proposed algorithm is presented as a more interpretable alternative to other commonly used machine learning algorithms and has shown predictive accuracy comparable to Random Forest, Support Vector Machine and Random Generalised Linear Model in tests on five QSAR data sets compiled from the ChEMBL database.
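
    The idea of splitting the data on one feature and fitting a separate linear equation in each region can be illustrated with a minimal sketch. This is not OPLRAreg itself: the abstract describes a mathematical-programming formulation, whereas the code below swaps in an exhaustive search over features and thresholds, restricts itself to two regions, and uses a ridge (L2) penalty, which shrinks coefficients but does not select features. All function names and parameters (fit_ridge, fit_two_region_model, predict, lam, min_samples) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit with an unpenalised intercept; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def fit_two_region_model(X, y, lam=1e-3, min_samples=5):
    """Search every feature and candidate threshold for the split that
    minimises the total squared error of two per-region ridge models."""
    best = None
    for j in range(X.shape[1]):
        # Candidate thresholds: midpoints between consecutive sorted unique values.
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, j] <= t
            if left.sum() < min_samples or (~left).sum() < min_samples:
                continue  # skip splits that leave a region with too few samples
            models, sse = [], 0.0
            for mask in (left, ~left):
                w, b = fit_ridge(X[mask], y[mask], lam)
                sse += np.sum((X[mask] @ w + b - y[mask]) ** 2)
                models.append((w, b))
            if best is None or sse < best["sse"]:
                best = {"feature": j, "threshold": t, "models": models, "sse": sse}
    return best

def predict(model, X):
    """Apply the left-region or right-region equation depending on the split feature."""
    left = X[:, model["feature"]] <= model["threshold"]
    (wl, bl), (wr, br) = model["models"]
    return np.where(left, X @ wl + bl, X @ wr + br)
```

    The fitted model stays readable: one split feature, one threshold, and one coefficient vector per region. That is the kind of interpretability the abstract contrasts with black-box learners.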

    Development of text mining tools for information retrieval from patents

    Biomedical literature is composed of an ever-increasing number of publications in natural language. Patents are a relevant fraction of these and are important sources of information because of the curated data produced by the granting process. However, their unstructured data makes searching for information a challenging task. To overcome this, biomedical text mining (BioTM) provides methodologies to search and structure that data. Several BioTM techniques can be applied to patents. Among these, Information Retrieval is the process by which relevant data is obtained from collections of documents. In this work, a patent pipeline was developed and integrated into ... Funding: FEDER - Federación Española de Enfermedades Raras (NORTE-01-0145-FEDER-000004)

    Linear Estimation of Location and Scale Parameters Using Partial Maxima

    Consider an i.i.d. sample X^*_1,X^*_2,...,X^*_n from a location-scale family, and assume that the only available observations consist of the partial maxima (or minima) sequence, X^*_{1:1},X^*_{2:2},...,X^*_{n:n}, where X^*_{j:j}=max{X^*_1,...,X^*_j}. This kind of truncation appears in several circumstances, including best performances in athletics events. In the case of partial maxima, the form of the BLUEs (best linear unbiased estimators) is quite similar to the form of the well-known Lloyd's (1952, Least-squares estimation of location and scale parameters using order statistics, Biometrika, vol. 39, pp. 88-95) BLUEs, based on (the sufficient sample of) order statistics, but, in contrast to the classical case, their consistency is no longer obvious. The present paper is mainly concerned with the scale parameter, showing that the variance of the partial maxima BLUE is at most of order O(1/log n), for a wide class of distributions. Comment: This article is devoted to the memory of my six-year-old little daughter, Dionyssia, who left us on August 25, 2010, at Cephalonia isl. (26 pages, to appear in Metrika)
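
    The partial maxima sequence defined above, X^*_{j:j}=max{X^*_1,...,X^*_j}, is simply the running maximum of the sample. A minimal sketch follows, with hypothetical helper names (partial_maxima, record_times). It only constructs the observed sequence and does not implement the BLUEs studied in the paper.

```python
import numpy as np

def partial_maxima(sample):
    """Return the running maxima X_{1:1}, X_{2:2}, ..., X_{n:n} of a sample."""
    return np.maximum.accumulate(np.asarray(sample, dtype=float))

def record_times(sample):
    """1-based indices j at which a new strict maximum (a record) is set."""
    maxima = partial_maxima(sample)
    # The first observation is always a record; later ones must exceed the previous maximum.
    is_record = np.r_[True, maxima[1:] > maxima[:-1]]
    return np.flatnonzero(is_record) + 1
```

    For example, partial_maxima([3, 1, 4, 1, 5, 9, 2, 6]) gives 3, 3, 4, 4, 5, 9, 9, 9, with records at positions 1, 3, 5 and 6. For a continuous distribution the expected number of records among n observations is the harmonic number 1 + 1/2 + ... + 1/n, which grows like log n, so the sequence contains few distinct values.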

    Subclinical VZV reactivation in immunocompetent children hospitalized in the ICU associated with prolonged fever duration

    A prospective observational study was conducted to examine whether asymptomatic VZV reactivation occurs in immunocompetent children hospitalized in an ICU and its impact on clinical outcome. A secondary aim was to test the hypothesis that vaccinated children have a lower risk of reactivation than naturally infected children. Forty immunocompetent paediatric ICU patients and healthy controls were enrolled. Patients were prospectively followed for 28 days. Clinical data were collected and varicella exposure was recorded. Admission serum levels of TNF-α, cortisol and VZV-IgG were measured. Blood and saliva samples were collected for VZV-DNA detection via real-time PCR. As a comparison, the detection of HSV-DNA was also examined. Healthy children matched for age and varicella exposure type (infection or vaccination) were also included. VZV reactivation was observed in 17% (7/39) of children. Children with VZV reactivation had extended duration of fever (OR = 1.17; 95% CI, 1.02–1.34). None of the varicella-vaccinated children or healthy controls had detectable VZV-DNA in any blood or saliva samples examined. HSV-DNA was detected in saliva from 33% of ICU children and 2.6% of healthy controls. Among children with viral reactivation, typing revealed wild-type VZV and HSV-1. In conclusion, VZV reactivation occurs in immunocompetent children under severe stress and is associated with prolonged duration of fever.

    Target identification of Mycobacterium tuberculosis phenotypic hits using a concerted chemogenomic, biophysical and structural approach

    Mycobacterium phenotypic hits are a good reservoir for new chemotypes for the treatment of tuberculosis. However, the absence of defined molecular targets and modes of action could lead to failure in drug development. Therefore, a combination of ligand-based and structure-based chemogenomic approaches, followed by biophysical and biochemical validation, has been used to identify targets for Mycobacterium tuberculosis phenotypic hits. Our approach identified EthR and InhA as targets for several hits, with some showing dual activity against these proteins. Of the 35 predicted EthR inhibitors, eight exhibited an IC50 below 50 μM against M. tuberculosis EthR and three were confirmed to be simultaneously active against InhA. Further hit validation was performed using X-ray crystallography, yielding eight new crystal structures of EthR inhibitors. Although the EthR inhibitors attain their activity against M. tuberculosis by hitting yet undefined targets, these results provide new lead compounds that could be further developed to potentiate the effect of EthA-activated pro-drugs, such as ethionamide, thus enhancing their bactericidal effect. GM is grateful to the European Molecular Biology Laboratory and Marie Sklodowska-Curie Actions for funding this work. VM and MB acknowledge Bill & Melinda Gates Foundation [subcontract by the Foundation for the National Institutes of Health (NIH)] (OPP1024021). VM and MS acknowledge the European Community’s Seventh Framework Programme [grant number 260872]. GP would like to acknowledge the Wellcome Trust and the European Molecular Biology Laboratory for funding. JPO was funded by the member nation states of the European Molecular Biology Laboratory. TLB acknowledges The Wellcome Trust for funding and support (grant number 200814/Z/16/Z).

    A document classifier for medicinal chemistry publications trained on the ChEMBL corpus

    Background: The large increase in the number of scientific publications has fuelled a need for semi- and fully automated text mining approaches to assist in the triage process, both for individual scientists and for larger-scale data extraction and curation into public databases. Here, we introduce a document classifier that is able to successfully distinguish between publications that are ‘ChEMBL-like’ (i.e. related to small molecule drug discovery and likely to contain quantitative bioactivity data) and those that are not. The unprecedented size of the medicinal chemistry literature collection, coupled with the advantage of manual curation and mapping to chemistry and biology, makes the ChEMBL corpus a unique resource for text mining. Results: The method has been implemented as a data protocol/workflow for both Pipeline Pilot (version 8.5) and KNIME (version 2.9). Both workflows and models are freely available at: ftp://ftp.ebi.ac.uk/pub/databases/chembl/text-mining. These can be readily modified to include additional keyword constraints to further focus searches. Conclusions: Large-scale machine learning document classification was shown to be very robust and flexible for this particular application, as illustrated in four distinct text-mining-based use cases. The models are readily available on two data workflow platforms, which we believe will allow the majority of the scientific community to apply them to their own data.

    Evaluation of machine-learning methods for ligand-based virtual screening

    Machine-learning methods can be used for virtual screening by analysing the structural characteristics of molecules of known (in)activity, and we here discuss the use of kernel discrimination and naive Bayesian classifier (NBC) methods for this purpose. We report a kernel method that allows the processing of molecules represented by binary, integer and real-valued descriptors, and show that it is little different in screening performance from a previously described kernel that had been developed specifically for the analysis of binary fingerprint representations of molecular structure. We then evaluate the performance of an NBC when the training set contains only very few active molecules. In such cases, a simpler approach based on group fusion would appear to provide superior screening performance, especially when structurally heterogeneous datasets are to be processed.
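
    The group-fusion approach mentioned at the end of the abstract can be sketched as follows. The abstract does not state which similarity coefficient or fusion rule was used, so this sketch assumes binary fingerprints, the Tanimoto coefficient, and a maximum-score fusion rule. The function names (tanimoto, group_fusion_rank) are hypothetical.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto coefficient between two binary fingerprint vectors."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    # Two all-zero fingerprints share no bits; treat their similarity as 0.
    return np.logical_and(a, b).sum() / union if union else 0.0

def group_fusion_rank(actives, database):
    """Score each database molecule by its maximum similarity to any known
    active, then return (indices sorted best-first, scores)."""
    scores = np.array([max(tanimoto(m, ref) for ref in actives) for m in database])
    return np.argsort(-scores, kind="stable"), scores
```

    Because each molecule only needs to resemble one of the reference actives, a ranking of this kind needs no model training. That fits the abstract's setting of very few known actives, possibly from structurally different classes.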